# nunique() counts distinct non-null values, same as value_counts().shape[0]
print("Districts: ", raw_data["District"].nunique(),
      " | Blocks: ", raw_data["Block"].nunique(),
      " | Gram Panchayats: ", raw_data["GP ID"].nunique(),
      " | Villages: ", raw_data["Village ID"].nunique(),
      " | Surveys: ", raw_data["Village"].count())
display_barh(raw_data['District'].value_counts(),title="Number of Surveys in Each District", size=[6,6])
hh_no_cnsnt = raw_data[raw_data["I have consent from family head/ adult member to proceed with the survey."] == "No"]
hh_no_cnsnt.shape[0]
### Data Error Condition 1: Number of Duplicate Records identified using Duplicate Key above
dup_records = raw_data[raw_data["is_duplicate_record"] == True]
dup_records.shape[0]
Who Submitted the Duplicate Records-
dup_records['Volunteer Name'].value_counts()
Removing the duplicate records from analysis...
Number of unique records:
# Pass the key columns as a list; keep='last' retains the latest submission per household
raw_data.drop_duplicates(subset=["District",
                                 "Block",
                                 "Gram Panchayat",
                                 "Village",
                                 "Household number"],
                         keep='last',
                         inplace=True)
raw_data.shape[0]
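To make the effect of `keep='last'` concrete, here is a minimal sketch on made-up data. The column names mirror the key used above; the rows, the `is_dup` flag and the volunteer labels are illustrative assumptions, not survey data, and the notebook's own `is_duplicate_record` column may be computed differently.

```python
import pandas as pd

# Toy data: same key columns as the survey, values are invented.
key_cols = ["District", "Block", "Gram Panchayat", "Village", "Household number"]
toy = pd.DataFrame({
    "District": ["D1", "D1", "D1"],
    "Block": ["B1", "B1", "B1"],
    "Gram Panchayat": ["G1", "G1", "G1"],
    "Village": ["V1", "V1", "V1"],
    "Household number": [1, 1, 2],
    "Volunteer Name": ["first entry", "resubmission", "other household"],
})

# keep=False flags every row whose key occurs more than once.
toy["is_dup"] = toy.duplicated(subset=key_cols, keep=False)

# keep='last' retains only the last-listed row for each key.
deduped = toy.drop_duplicates(subset=key_cols, keep="last")
print(deduped[["Household number", "Volunteer Name"]])
```

Household 1 appears twice, so only its second row survives, while household 2 is untouched.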
no_cnsnt_col="I have consent from family head/ adult member to proceed with the survey."
hh_no_cnsnt = raw_data[raw_data[no_cnsnt_col] == "No"]
hh_no_cnsnt.shape[0]
display_bar(hh_no_cnsnt["District"].value_counts(),title="Number of Households not Surveyed", size=[14,3])
raw_data["Why did you not get permission to do the survey?"].value_counts()
display_donut(raw_data["Why did you not get permission to do the survey?"],
title='Why did you not get permission to do the survey?',
width=5,
height=5,
pct=True)
List of Villages with number of households where survey could not be conducted-
create_download_link(hh_no_cnsnt_pvt,
title='Click to Download',
filename='List of Households - Survey not conducted.csv',
level='Village')
Removing the records without consent from analysis
Number of records with consent-
#raw_data=raw_data.drop(raw_data[raw_data["I have consent from family head/ adult member to proceed with the survey."] == "No"].index)
value.append('Total Households')
print(value)
orig_hh_cnt=raw_data.pivot_table(index=['District','Block','Gram Panchayat','Village'],
values=value,
aggfunc=np.sum) #Create Copy of Original Household Count
raw_data=raw_data[raw_data[no_cnsnt_col]=="Yes"] #Keep records only with consent
raw_data.shape[0]
Total Households included in Survey:
total_hh = raw_data.shape[0]
print(total_hh)
Households by Social Category:
Distribution of households across socio-economic categories in each district
raw_data.pivot_table(index=["District"],
columns=["Caste (Avoiding asking. Ask only if doubtful)"],
values=['Total Households'],
aggfunc={'Total Households':[np.sum]}).style.apply(highlight_max,axis=1)
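The `highlight_max` helper passed to `.style.apply` is not defined in this section. A minimal sketch, assuming it follows the common pandas Styler pattern of returning one CSS string per cell; the colour and the category counts below are made up.

```python
import pandas as pd

def highlight_max(row, color="yellow"):
    """Return one CSS string per cell, marking the largest value in the row."""
    is_max = row == row.max()
    return [f"background-color: {color}" if flag else "" for flag in is_max]

# With .style.apply(highlight_max, axis=1) the function is called once per row.
# Here it is called directly on a single illustrative row.
row = pd.Series({"General": 40, "OBC": 75, "SC": 20, "ST": 10})
styles = highlight_max(row)
print(styles)
```

With `axis=1` the highlighted cell is the largest category within each district, not the largest value in the whole table.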
Distribution of Villages by % of Households having TBR
Why was there never a toilet in the household?
print("Households which never had a toilet:",hh_by_tbr.loc['No, never had TBR','Total Households'] )
display_donut(raw_data['Why did you never have a Toilet?'],
title='',
width=6,
height=6,
pct=True)
When a TBR is no longer available in a household but was there earlier, what happened to it?
display_barh(raw_data['What happened to your TBR?'].value_counts(),
title='Current Status of TBR',
size=[7,4])
So what is the common practice for defecation in households?
display_donut(raw_data['You do not have a Toilet. Where do you defecate?'],
title='',
width=4,
height=4,
pct=True)
In which villages is open defecation most widespread (Top 20)?
col='You do not have a Toilet. Where do you defecate?'
od_hh_dist=raw_data.pivot_table(index=['District','Block','Gram Panchayat','Village'],
values=value,
aggfunc=np.sum)
od_hh_dist.fillna(0,inplace=True)
od_hh_dist.sort_values(value[0], ascending=False).head(20)
All villages where open defecation (OD) is in practice-
create_download_link(od_hh_dist,filename="Distribution of OD Households.csv", level='Village')
Households Connected to Water Supply
Observations: 25.38% of households do not have a water connection. Of those, 94.57% (i.e. about 24% of all households) never had one.
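The 24% figure follows from multiplying the two shares quoted in the observation. A quick check, using only those two numbers:

```python
no_connection_share = 25.38    # % of all households without a connection
never_connected_share = 94.57  # % of those households that never had one

# Share of ALL households that never had a connection
never_connected_of_total = no_connection_share * never_connected_share / 100
print(round(never_connected_of_total, 2))
```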
Distribution of Villages by % of Households Not Having Water Supply Connection
Why did households never have tap connection?
Distribution of households (by %) by major reasons for not getting water supply connection:
print("Total Households which never connected to WSS: ",hh_by_ws.iloc[1,0],"(",hh_by_ws.iloc[1,1],"%)")
a. Across all regions -
value = [col+"-"+x for x in options]
hh_nvr_cnctd_sw = raw_data[raw_data["Did you ever have tap connection (individual pipeline) to your house/ TBR?"]=="No"].pivot_table(index='State',
values = value,
aggfunc=np.mean).round(2)
display_bar(hh_nvr_cnctd_sw[value],size=[15,9],title='')
b. In districts
hh_nvr_cnctd_dw = raw_data[raw_data["Did you ever have tap connection (individual pipeline) to your house/ TBR?"]=="No"].pivot_table(index='District',
values = value,
aggfunc=np.mean).round(2)
display_bar(hh_nvr_cnctd_dw[value],size=[15,10],title='')
c. In Villages
### Create a village-wise pivot of the % of such households not receiving water, along with the reason
hh_nvr_cnctd_vw = raw_data.pivot_table(index = ['District','Block','Gram Panchayat', 'Village'],
values = value,
aggfunc = np.mean
).round(2)
create_download_link(hh_nvr_cnctd_vw,title="Download Complete Table",filename="Reasons for No WSS Connection.csv", level='Village')
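The pivots above take the mean of each reason column. That mean reads as a percentage of households only if every reason column holds 0 or 100 per household, which the `== 100` comparisons elsewhere in the notebook suggest but this section does not confirm. A minimal sketch under that assumption, with made-up reason labels and data:

```python
import pandas as pd

# Hypothetical reason columns, encoded 100 if the household gave that reason, else 0.
reason_cols = ["Reason-Too costly", "Reason-No pipeline nearby"]
toy = pd.DataFrame({
    "District": ["D1", "D1", "D1", "D1", "D2", "D2"],
    "Reason-Too costly":         [100, 0, 100, 0, 0, 0],
    "Reason-No pipeline nearby": [0, 100, 100, 100, 100, 0],
})

# The mean of a 0/100 indicator is the percentage of households giving that reason.
share = toy.pivot_table(index="District", values=reason_cols, aggfunc="mean").round(2)
print(share)
```

Because a household can give several reasons, the percentages in a row need not add up to 100.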
What is the willingness of households to reconnect the water supply?
Among those households which had a tap connection earlier but not at present
print("Total Households which were once connected to WSS but not now: ",hh_by_ws.iloc[2,0],"(",hh_by_ws.iloc[2,1],"%)")
*Among those households which never had a tap connection*
print("Total Households which never connected to WSS: ",hh_by_ws.iloc[1,0],"(",hh_by_ws.iloc[1,1],"%)")
Condition of tap connection (pipeline to house/ TBR)
*Observations: 9.26% of the household tap connections are not working*
Supply of water to households through pipeline
How often do the households get water? (Number of households that get water regularly)
Overall Status of Water Supply System in the Village
Distribution of villages by frequency of water supply to households (%)
How often does a given % of connected households get water supply in the village?
create_download_link(df=wss_connection_status,
filename='List of Villages - % of Households by Frequency of Water Supply.csv',
title='Download Complete List',
level='Village')
Water shortage month-wise (% of households having a water supply which do not get water in each month)
print("Total Households which get water supply: ",raw_data[raw_data['Do you have water supply to your house/TBR ?']=='Yes'].shape[0])
Why do households not get 24x7 water supply?
Does the entire village get water supply?
Observation:
a. Villages where more than 20% of the connected households do not get water:
b. Villages where over 50% of connected households do not get water:
c. Villages where 100% of households receive water:
d. Villages where no household (0%) receives water:
create_download_link(df=villages_with_no_supply,title="Click to Download",
filename='List of Villages-% of HH with Water Supply.csv',
level='Village')
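The four village groups listed in the observation can be derived from a single per-village percentage. A minimal sketch on made-up data, assuming the grouping is based on the 'Do you have water supply to your house/TBR ?' answer; how the notebook actually builds `villages_with_no_supply` is not shown in this section.

```python
import pandas as pd

ws_col = "Do you have water supply to your house/TBR ?"
toy = pd.DataFrame({
    "Village": ["V1"] * 4 + ["V2"] * 4 + ["V3"] * 2 + ["V4"] * 2,
    ws_col: ["Yes", "Yes", "Yes", "No",   # V1: 25% without water
             "Yes", "No", "No", "No",     # V2: 75% without water
             "Yes", "Yes",                # V3: 0% without water
             "No", "No"],                 # V4: 100% without water
})

# Percentage of connected households in each village that do not get water
pct_no_water = toy[ws_col].eq("No").groupby(toy["Village"]).mean().mul(100)

over_20 = pct_no_water[pct_no_water > 20].index.tolist()         # group a
over_50 = pct_no_water[pct_no_water > 50].index.tolist()         # group b
all_supplied = pct_no_water[pct_no_water == 0].index.tolist()    # group c
none_supplied = pct_no_water[pct_no_water == 100].index.tolist() # group d
print(over_20, over_50, all_supplied, none_supplied)
```

The groups overlap by construction: every village in group b or d is also in group a.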
Why is the water not being supplied?
Distribution of households (%) by major reasons for no supply of water:
a. Across all regions -
b. In districts
c. In Villages (Top 50)
Cells in red denote the village where the issue is most common
Frequency distribution of villages by reason for households (%) not being connected to the water supply system
Why has a given % of households not been connected to the water supply system in the village?
Households having 3rd Tap Connection
Households which opted for TBR and/or 3rd Tap Connection
Households which opted for Water Supply Connection and/or 3rd Tap Connection
What is the Condition of the 3rd Tap installed in Households?
Where have people installed the 3rd Tap preferably?
Where do people get drinking water from?
% of Households reporting different sources of drinking water
Overall
By Connection to Water Supply
By Availability of 3rd Tap
By District
hh_drnkng_wtr_src = raw_data.pivot_table(index='District',
values = value,
aggfunc=np.mean).round(2)
display_bar(hh_drnkng_wtr_src,size=[16,8],title='')
create_download_link(df=hh_drnkng_wtr_src,title="Download Complete List",
filename='List of District-Major Sources of Water for Households.csv',
level='District')
By Village
How is waste water from bathroom disposed?
Number of households actually having waste water disposal system
Where is waste water from the bathroom disposed?
Is the waste water flowing out of the bathroom properly disposed of in a sewer/kitchen/garden/pit?
Does anyone in family use the toilet?
% of households where the toilet is used by any family member
By Availability of TBR
Why does the family not use the toilet?
Reason for not using the toilet (by % of Households)
a. Overall
b. District Wise
c. Village Wise
Households do not use the toilet primarily due to lack of water. What, then, is the status of water supply in those households that are connected to the water supply?
raw_data[raw_data['Why dont they use the Toilet? (1)-No water']==100]['How often do you get water?'].value_counts()
hh_tlt_no_usg_by_ws = raw_data[raw_data[par_col]=="No"].pivot_table(index=['How often do you get water?'],
values = value,
aggfunc=np.mean).round(2)
hh_tlt_no_usg_by_ws
On which occasions do the households not use the toilet?
a. Overall
col="What are the occasions family members not use the Toilet?"
options=get_options(col)
decompose_multiselect_answers_normalized(col,options)
value=[col+"-"+x for x in options]
hh_tlt_no_use_occ = raw_data.pivot_table(index='State',
values=value,
aggfunc=np.sum)
display_bar(hh_tlt_no_use_occ,title='',size=[16,9],pct=True)
b. In Districts
hh_tlt_no_use_occ_dw = raw_data.pivot_table(index='District',
values=value,
aggfunc=np.sum)
display_bar(hh_tlt_no_use_occ_dw,title='',size=[16,9],pct=False)
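The helpers `get_options` and `decompose_multiselect_answers_normalized` are not defined in this section. A minimal sketch of what such helpers might do, assuming multi-select answers are stored as one string with options separated by `", "` and that "normalized" means a 0/100 indicator per option. The `_sketch` function names, the separator and the option labels are all hypothetical.

```python
import pandas as pd

SEP = ", "  # assumed separator between selected options

def get_options_sketch(df, col):
    """Collect the distinct options that appear in a multi-select column."""
    options = set()
    for answer in df[col].dropna():
        options.update(answer.split(SEP))
    return sorted(options)

def decompose_multiselect_sketch(df, col, options):
    """Add one 0/100 indicator column per option, named '<col>-<option>'."""
    picked = df[col].fillna("").str.split(SEP)
    for opt in options:
        df[col + "-" + opt] = picked.apply(lambda chosen: 100 if opt in chosen else 0)
    return df

col = "What are the occasions family members not use the Toilet?"
toy = pd.DataFrame({col: ["At night, During rain", "At night", None]})  # made-up answers

options = get_options_sketch(toy, col)
decompose_multiselect_sketch(toy, col, options)
value = [col + "-" + x for x in options]
print(toy[value])
```

The resulting `value` list has the same `col + "-" + option` shape that the pivots above pass as `values`.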
Does anyone in family use the bathroom?
% of households where the bathroom is used by any family member
By Availability of TBR
Why does the family not use the bathroom?
Reason for not using the bathroom (by % of Households)
a. Overall
b. District Wise
c. Village Wise
Households do not use the bathroom mainly due to lack of water. What, then, is the status of water supply in those households?
raw_data[raw_data['Why do they not use the bathroom?-No water']==100]['How often do you get water?'].value_counts()
hh_bth_no_usg_by_ws = raw_data[raw_data[par_col]=="No"].pivot_table(index=['How often do you get water?'],
values = value,
aggfunc=np.mean).round(2)
hh_bth_no_usg_by_ws
Summary of Responses on TBR Usage in households, across different groups and gender
hh_t_b_usg
TBR Usage in households, across different groups and gender
Status of Water Supply and TBR in Villages
hh_wss_tbr_staus = raw_data.pivot_table(index=['District','Block','Gram Panchayat', 'Village'],
values = ['Total Households',
"Caste (Avoiding asking. Ask only if doubtful)-SC",
"Caste (Avoiding asking. Ask only if doubtful)-ST",
"Caste (Avoiding asking. Ask only if doubtful)-OBC",
"Caste (Avoiding asking. Ask only if doubtful)-General",
'Do you have TBR?-Yes, I have TBR',
'Do you have TBR?-Toilet only',
'Do you have TBR?-No',
'Did you ever have TBR?-No',
'Do you NOW have tap connection (individual pipeline) to your house/ TBR?-Yes',
'Do you NOW have tap connection (individual pipeline) to your house/ TBR?-No',
'Do you have water supply to your house/TBR ?-Yes',
'Do you have water supply to your house/TBR ?-No'],
aggfunc=np.sum)
Household Survey Overview
create_download_link(hh_wss_status,title='Click to Download', filename='Household Survey Overview.csv',level='Village')